Comparative analysis and characterization of the chloroplast genome of Krascheninnikovia ceratoides (Amarathaceae): a xerophytic semi-shrub exhibiting drought resistance and high-quality traits

Background Krascheninnikovia ceratoides, a perennial halophytic semi-shrub belonging to the genus Krascheninnikovia (Amarathaceae), possesses noteworthy ecological, nutritional, and economic relevance. This species is primarily distributed across arid, semi-arid, and saline-alkaline regions of the Eurasian continent, encompassing Inner Mongolia, Xinjiang, Qinghai, Gansu, Ningxia, and Tibet. Results We reported the comprehensive chloroplast (cp) genome of K. ceratoides, characterized by a circular conformation spanning 151,968 bp with a GC content of 36.60%. The cp genome encompassed a large single copy (LSC, 84,029 bp), a small single copy (SSC, 19,043 bp), and a pair of inverted repeats (IRs) regions (24,448 bp each). This genome harbored 128 genes and encompassed 150 simple sequence repeats (SSRs). Through comparative analyses involving cp genomes from other Cyclolobeae (Amarathaceae) taxa, we observed that the K. ceratoides cp genome exhibited high conservation, with minor divergence events in protein-coding genes (PCGs) accD, matK, ndhF, ndhK, ycf1, and ycf2. Phylogenetic reconstructions delineated K. ceratoides as the sister taxon to Atriplex, Chenopodium, Dysphania, and Suaeda, thus constituting a robust clade. Intriguingly, nucleotide substitution ratios (Ka/Ks) between K. ceratoides and Dysphania species for ycf1 and ycf2 genes surpassed 1.0, indicating the presence of positive selection pressure on these loci. Conclusions The findings of this study augment the genomic repository for the Amarathaceae family and furnish crucial molecular instruments for subsequent investigations into the ecological adaptation mechanisms of K. ceratoides within desert ecosystems. Supplementary Information The online version contains supplementary material available at 10.1186/s12863-024-01197-y.


Background
Drought stress is one of the most significant limiting factors affecting plant growth, development, and agricultural productivity worldwide [1].As climate change progresses, the increasing frequency and severity of droughts present substantial challenges to global food security and ecosystems [2].To mitigate the impacts of drought on agriculture, it is essential to identify and utilize drought-resistant plant species that can maintain productivity under water-limited conditions [3].Xerophytic plants, which are adapted to arid environments, have evolved a range of morphological, physiological, and molecular adaptations that enable them to with-stand water scarcity [4].
These adaptations allow xerophytes to survive and even thrive in environments where other plant species may struggle or perish [5].Identifying and characterizing the genetic components underlying these drought-resistant adaptations in xerophytic plants can inform plant breeding programs aiming to improve drought resistance in crop plants [6].By incorporating these adaptive traits into crops, it may be possible to enhance their resilience to drought stress and maintain agricultural productivity in the face of an increasingly uncertain climate [7].As a result, studying xerophytic plants and their unique adaptations to water scarcity can provide valuable insights for developing more resilient agricultural systems and ensuring global food security.
Krascheninnikovia ceratoides, belonging to the Amarathaceae family, is a perennial semi-shrub that thrives in arid, semi-arid, and saline-alkaline environments found in western China regions such as Inner Mongolia, Xinjiang, Qinghai, Gansu, Ningxia, and Tibet, as well as neighboring areas across the Eurasian continent [8][9][10].This plant species plays a vital ecological role in desert ecosystems and is extensively utilized for local forage and economic benefits [11].While numerous studies have delved into its ecological, physiological [12], anatomical, genetic, and chemical aspects as a member of the Krascheninnikovia genus, its cp genome remains unexamined.Moreover, no phylogenetic analyses based on entire cp genome have been published to date.
Prior research on K. ceratoides has primarily focused on its ecological and physiological traits [12][13][14][15], anatomical structure of vegetative organs [16], chromosome karyotype [17], application potential [18], chemical components [19], and genetic diversity [20].For instance, Han et al. [20] used the RAPD (Random Amplified Polymorphic DNA) technique to investigate seven Krascheninnikovia species, uncovering significant distinctions and a wealth of genetic diversity among these species.Nonetheless, the cp genome of K. ceratoides has not yet been characterized, and no published phylogenetic analyses based on entire cp genome are available.This study intends to fill this knowledge gap by providing an in-depth understanding of the genetic makeup of this crucial plant species.By examining the cp genome of K. ceratoides and carrying out phylogenetic analyses with the whole cp genome, we aim to further our comprehension of the Krascheninnikovia genus's evolution and diversity while supplying a valuable resource for future ecological and utilization research on K. ceratoides.
The influx of cp genome sequences has led to the development of advanced bioinformatics tools and resources, which are essential for analyzing and interpreting the vast amounts of data generated by next-generation sequencing (NGS) [21].These tools include algorithms for sequence assembly, annotation, and comparative analysis, which facilitate the identification of structural variations, gene content, and functional elements within cp genomes [22].Additionally, these tools enable researchers to investigate phylogenetic relationships among plant species and trace the evolutionary history of cp genomes [23].The expansion of cp genome data has significantly broadened our knowledge of cp genome diversity and evolution [24].This wealth of information has unveiled unique features, such as variations in gene content, gene order, and repeat elements, which offer insights into the adaptive processes and evolutionary forces that have shaped cp genomes over time [25,26].Furthermore, it has highlighted the potential role of cp genomes in plant speciation, adaptation, and population genetics.
Harnessing the potential of cp genetic resources has opened up new avenues in plant breeding, biotechnology, and conservation efforts [27].In plant breeding, cp genomes can serve as a valuable source of genetic markers, which can be employed for the development of improved crop varieties with desirable traits, such as increased yield, resistance to pests and diseases, and enhanced stress tolerance [28].In biotechnology, cp genomes offer opportunities for the engineering of transplastomic plants, which can produce high levels of bioactive compounds, biopharmaceuticals, or functional proteins for industrial, pharmaceutical, or agricultural applications [29].Lastly, the comprehensive knowledge of cp genome diversity and evolution can aid in the conservation of plant biodiversity by providing essential information for the identification of endangered species, the assessment of genetic diversity within and among populations, and the formulation of effective conservation strategies [30].The rapid and cost-effective sequencing of complete cp genomes, enabled by next-generation sequencing technologies and bioinformatics tools, has significantly expanded our understanding of cp genome diversity and evolution [31].This increased knowledge has un-locked new possibilities in plant breeding, biotechnology, and conservation efforts, ultimately contributing to global food security, environmental sustainability, and human well-being.
In this study, we present a comprehensive analysis and characterization of the cp genome of K. ceratoides.Our objectives were to: (1) assemble and annotate the cp genome of K. ceratoides, (2) perform comparative analyses with related species, (3) investigate SSRs, long repeats, and codon usage patterns, and (4) elucidate the phylogenetic relationships among species in the Amarathaceae family.This information will contribute to our understanding of the genetic basis underlying the drought resistance and high-quality traits of K. ceratoides and inform future breeding and conservation efforts.Our results will not only contribute to the development of molecular markers for more extensive Amarathaceae investigations but also offer a theoretical foundation for genetic breeding, population research, and the evaluation of phylogenetic connections within the Amarathaceae family.

Chloroplast genome characteristics of Krascheninnikovia ceratoides
The K. ceratoides cp genome exhibits a typical quadripartite circular structure with a total length of 151,968 bp.This genome is composed of an 84,029 bp LSC region and a 19,043 bp small single copy (SSC) region, separated by a pair of 24,448 bp long IRa and IRb regions.The cp genome's overall GC content is 36.60%,with the LSC, SSC, and the two IR regions having GC contents of 34.80, 30.60, and 42.20%, respectively (Table 1).The K. ceratoides cp genome encodes a total of 128 genes, including 84 PCGs, 36 tRNA genes, and eight rRNA genes (Table 1).Among these genes, 84 are located in the LSC region, 13 in the SSC region, and 31 in the IR regions (Fig. 1 and Table 2).Seven PCGs (petB, petD, atpF, ndhA, rpoC1, rps16, rpl16) and six tRNA genes (trnA-UGC , trnI-CAU , trnK-UUU , trnL-UAA , trnR-ACG , trnV-UAC ) contain one intron, while three PCGs (ndhB, rps12, clpP) and one gene of unknown function (ycf3) have two introns (Tables 1 and 2).
An examination of codon usage in the chloroplast genome of K. ceratoides identified 25,398 codons representing 20 amino acids and stop codons across the 68 PCGs.Leucine was found to be the most frequent codon, accounting for 10.53% of the total codons, followed by isoleucine (8.39%), serine (7.51%), and the least common was cysteine (1.17%) (Table S3).Additionally, a total of 30 degenerate codons were identified with Relative Synonymous Codon Usage (RSCU) values greater than one, indicating a bias in codon usage in K. ceratoides' chloroplast genome.Of all the codons analyzed in K. ceratoides chloroplast genome, the AGA codon encoding arginine showed the strongest usage bias with a value of 1.96.Codons with RSCU values greater than 1.0 mostly ended with A or U at the third position, except for UUG (Fig. 4 and Table S3).

Analysis of IR contraction and expansion
Comparison of Cyclolobeae species cp genomes with that of K. ceratoides indicated that they had comparable numbers and positions of PCGs, tRNAs, and rRNAs, as well as comparable locations of the IR/LSC and IR/SSC boundary regions (Fig. 5).However, notable differences were detected, including expansions or contractions of the IR/LSC and IR/SSC boundary regions.In all Cyclolobeae cp genomes, the rps19 gene crossed over the LSC/ IRb region, but its length varied in the IRb region.The ndhF gene crossed over the IRb/SSC region in seven cp genomes, whereas in K. ceratoides, it was entirely located in the SSC region.The ycf1 gene crossed over the SSC/ IRa region junction in all Cyclolobeae cp genomes, with varying lengths located in the IRa region.Furthermore, the trnH gene was found in the LSC region of all cp genomes.These similarities and differences provide insights into the genetic diversity and evolution of Cyclolobeae species, including K. ceratoides.
The Ka/Ks substitution rates for 76 protein-coding genes in the cp genomes of K. ceratoides and Dysphania species were calculated, and the results showed that 17 genes had a Ka/Ks value of zero or could not be calculated.Out of the remaining 59 genes, four genes (atpA, rps16, ycf1, and ycf2) had Ka/Ks values higher than 1.0, Large subunit of ribosome rpl2 c , rpl14, rpl16 a , rpl20, rpl22, rpl32, rpl33, rpl36 while the Ka/Ks ratios of four other genes (atpE, cemA, clpP, rps7) were between 0.5 to 1.0 in both K. ceratoides and Dysphania species.Among these genes, ycf1 and ycf2 were detected in both K. ceratoides and Dysphania species, while atpA and rps16 genes were only found in D. ambrosioides and D. botrys (Fig. 8).

Discussion
This study delves into the chloroplast (cp) genome of K. ceratoides from the Amarathaceae family.A comprehensive annotation and assembly process revealed insights into its taxonomic relationships and phylogenetic position.The identification of SSRs and genomic hotspots are pivotal for developing molecular markers, enhancing research in evolutionary history, population genetics, and breeding strategies for K. ceratoides and related species.The findings on nucleotide biases and amino acid representation in K. ceratoides' cp genome are consistent with those in other eudicots, augmenting our understanding of genetic diversity and evolutionary connections within the Amarathaceae.This research sets a solid base for future studies on chloroplast genome evolution in this family, offering valuable tools for various genetic and ecological explorations.The cp genome of K. ceratoides shows similarities with other angiosperm species.Like many other eudicot species, K. ceratoides has a trans-splicing rps12 gene.The absence of the ycf15 gene in the K. ceratoides cp genome is consistent with previous findings in species from unrelated orders, such as Acorus, Ceratophyllum, Illicium, and Ranunculus.The overall GC content of the K. ceratoides cp genome is 36.60%,which falls within the range reported for most species within its family.The LSC and SSC regions have lower GC content than the IR region.This difference can be attributed to the presence of eight GC-rich rRNA genes in the IR regions, as observed in previous studies.This genomic organization is typical of many angiosperm cp genomes and supports the idea that K. ceratoides is closely related to other members of its family.
Simple sequence repeats, or microsatellite sequences, are widely used as molecular markers in population genetics and evolutionary research due to their high degree of polymorphism at the species level.In the chloroplast genome of K. ceratoides, researchers have identified 150 SSR sites, which is a greater number than what has been found in other species within the Amarathaceae family.The study also revealed a greater abundance of mononucleotide SSRs compared to di-nucleotide and tri-nucleotide SSRs, which is consistent with previous findings in other Amarathaceae species [32].SSRs with smaller units tend to exhibit increased polymorphism, making them more useful as molecular markers.As a result, the K. ceratoides SSRs, which predominantly consist of shorter units, have significant potential for the development of effective molecular markers.These markers could be employed in future population genetics and evolutionary studies, providing valuable insights into the biology and history of K. ceratoides and related species.
The finding that the amino acid cysteine is the least represented in the K. ceratoides cp genome is significant as it provides insights into the molecular characteristics and evolutionary relationships of this species with other eudicot plants.Cysteine's low representation in the K. ceratoides cp genome is consistent with previous observations in other eudicot species [25,26].This similarity suggests that K. ceratoides shares common genomic features with other eudicots, which can contribute to our understanding of the evolutionary relationships among these plant species.Cysteine is an essential amino acid containing a sulfhydryl group, which can form disulfide bonds that play a crucial role in protein folding and stabilization.The low abundance of cysteine in the cp genome of K. ceratoides might have implications for the structure, Relative Synonymous Codon Usage (RSCU) values were calculated to understand the usage bias of different codons for the same amino acid.30 degenerate codons (codons that code for the same amino acid) showed RSCU values greater than one, indicating a preference for certain codons over others in the K. ceratoides cp genome.The AGA codon, which encodes arginine, displayed the highest usage bias with a value of 1.96.It is noteworthy that all other codons with RSCU values exceeding 1.0 ended in either A (adenine) or U (uracil), except for UUG.This observation suggests that the K. ceratoides cp genome has a bias towards using codons ending in A or U, which is a common pattern observed in chloroplast genomes.Understanding the codon usage patterns and biases in the K. ceratoides cp genome is important for studying gene expression, regulation, and evolution.This information can also be useful for comparative genomics and biotechnology applications, such as designing genes with optimized codon usage for efficient expression in this plant species.
The analysis revealed a preference for codons ending with A/U in the K. ceratoides cp genome.This finding is important because it provides insights into the molecular characteristics and evolutionary relationships of this species with other eudicot plants.The preference for A/U-ending codons in the K. ceratoides cp genome mirrors the A/U nucleotide bias observed in the cp genomes of other eudicots, such as Sinapis (Brassicaceae and Leptodermis (Rubiaceae) [33].This similarity suggests that K. ceratoides shares common genomic features with other eudicot species, which can contribute to our understanding of the evolutionary relationships among these plant species.Codon usage bias, or the unequal usage of synonymous codons for the same Fig. 5 Analyze the boundaries between the LSC/SSC/IR regions and adjacent genes in Cyclolobeae (Amarathaceae), nine cp genomes were compared, including the cp genome of Krascheninnikovia ceratoides amino acid, is a common feature in the genomes of various organisms.This bias can be influenced by factors such as mutational pressures, gene expression level, and selection for translational efficiency.The preference for A/U-ending codons in the K. ceratoides cp genome could have implications for the efficiency and accuracy of protein synthesis in this species, as well as the stability of its RNA structures.Further research on codon usage bias in K. ceratoides and its relationship with other molecular and evolutionary factors could provide valuable insights into the molecular mechanisms and evolutionary forces shaping the biology of this species and related plants.These findings further support the notion that K. ceratoides shares common genomic features with other eudicot species, providing insights into the evolutionary relationships and molecular characteristics of the cp genomes in this group of plants.
Variations in the IR regions, including contractions and expansions, are a common phenomenon in plant cp genomes and can result in differences in genome lengths.These variations can occur in both distantly and closely related species, leading to the generation of pseudogenes, gene duplications, and single-copy gene deletions.Compared to species from other families, such as Lagerstroemia of Lythraceae [34] and Aconitum of Ranunculaceae [35], the IR/SSC boundaries in Cyclolobeae are more conserved.Furthermore, the gene types and arrangements at the IR/SSC boundaries in the cp genome of Cyclolobeae are similar, with one notable exception: the loss of the trnH gene at the IRa/LSC boundary in K. ceratoides.This observation highlights the unique features of K. ceratoides cp genome, providing valuable insights into the evolution and genetic diversity within the Cyclolobeae group and the broader Amarathaceae family.
This study identified mutational hotspots for phylogenetic analyses and intraspecies discrimination at the species level through plastome-wide comparisons [36].Even though the rbcL gene has been widely employed in phylogenetic research of the Amarathaceae family at the inter-familial level, its usefulness may be limited due to its relatively slower evolutionary rates.Instead, noncoding regions of the chloroplast genome may be more suitable for phylogenetic analysis as they tend to have higher mutation rates and greater variability, resulting from reduced selective pressures [37].As a result, several non-coding regions, such as the intergenic spacers atpB-rbcL, ndhF-rpL32, psbB-psbH, and trnL-trnF, have been employed for phylogenetic reconstructions within Amarathaceae [38,39].The comparisons between the cp genome of K. ceratoides and other Amarathaceae species revealed sequence variations mainly within the SSC and LSC regions, with the LSC region being particularly divergent.Among the genes and non-coding regions examined, accD, matK, ndhF, ndhK, ycf1, and ycf2 were identified as the most dissimilar, whereas the non-coding regions situated between accD-psaI, ndhF-trnL, petA-psbJ, psbF-petL, trnC-psbM, trnS-trnG, and ycf2-trnL exhibited the greatest degree of divergence.These genes and spacers can be considered as suitable candidates for developing molecular markers for resolving phylogenetic relationships and species delimitation within the Amarathaceae family.
It was observed that the SSC and LSC regions displayed more nucleotide variability compared to the IR regions in both K. ceratoides and Dysphania species.This finding aligns with the expectation that the IR regions tend to be more conserved as they play a crucial role in preserving the stability of the genome [40].Understanding nucleotide diversity among plant species is important for studying their evolution, genetic structure, and adaptation.The identification of regions that exhibit high levels of variability can prove to be extremely valuable in the creation of molecular markers that can be utilized in population genetics, phylogenetics, and plant breeding.Additionally, this information can aid in the design of conservation strategies and the assessment of genetic resources in these species.
The Ka/Ks ratio in the cp genomes of K. ceratoides and Dysphania species, only two genes, ycf1 and ycf2, showed values exceeding 1.0, indicating the presence of positive selective pressure.While ycf1 and ycf2 were initially identified as plastid-encoded proteins essential for the survival of Chlamydomonas reinhardtii and Nicotiana tabacum, their specific functions remain unclear [41].However, numerous phylogenetic analyses of ycf genes in other species have suggested that positive selection acts on ycf1 and ycf2 [25,26].As a result, further investigation is necessary to determine the functional implications of positive selective pressure on these genes in the context of K. ceratoides, Dysphania species, and other related plants.Understanding the evolutionary forces shaping these genes may provide valuable insights into their functional roles in the biology and adaptation of these species.
The analysis of entire cp genomes in this study confirms that all 25 species of Amarathaceae form a monophyletic group.K. ceratoides is demonstrated to be closely related to Atriplex, Chenopodium, Dysphania, and Suaeda, which is consistent with previous studies based on nuclear ribosomal internal transcribed spacer and cp DNA data.However, it is important to acknowledge that future studies should include additional species to further support these relationships.Addressing Fig. 9 Phylogenetic tree was created based on the chloroplast genomes of 25 species from Amarathaceae, utilizing both BI and ML methods.The nodes with 100% bootstrap support and 1.00 pp. were marked with stars ("★") indicating they were fully supported in both analyses taxonomic gaps in Amarathaceae is crucial for enhancing our understanding of the evolution and relationships within this diverse family.By incorporating more species and analyzing their cp genomes, researchers can gain a more comprehensive view of the phylogenetic relationships, molecular characteristics, and evolutionary history of the Amarathaceae family, contributing to a better understanding of this diverse group of plants.

Conclusions
This study presents the first comprehensive chloroplast (cp) genome sequence of K. ceratoides, a halophytic semi-shrub of ecological, nutritional, and economic significance, that is widely distributed across arid regions of the Eurasian continent.By providing a thorough characterization of the cp genome, including its structure, gene content, and simple sequence repeats, this study expands the genomic resources available for the species.Moreover, it reports novel insights into the evolutionary relationships of K. ceratoides with other members of the Cyclolobeae (Amarathaceae) taxa, positioning it as a sister taxon to Atriplex, Chenopodium, Dysphania, and Suaeda.Additionally, the study uncovers evidence of positive selection pressure on ycf1 and ycf2 genes in K. ceratoides, which is an intriguing observation with potential implications for understanding the adaptive mechanisms of the species in desert ecosystems.The phylogenetic insights obtained from this research can be useful in addressing future taxonomic or nomenclatural challenges related to K. ceratoides.Moreover, the findings of this study provide a crucial resource for subsequent investigations into the genetic diversity and adaptation of K. ceratoides in arid and semi-arid regions across the Eurasian continent.This research contributes to a deeper understanding of the evolution and diversity within the Krascheninnikovia genus, paving the way for future studies on the ecology and utilization of K. ceratoides.The insights gained from this study offer fresh perspectives on the genetic features of K. ceratoides, establishing a foundation for future research focused on the plant's classification, environmental impact, and potential applications.

DNA extraction and sequencing
For this study, fresh leaves of Krascheninnikovia ceratoides were meticulously collected from Golmud City in Qinghai Province, China, located at coordinates 36.89°latitude and 93.14° longitude, with an altitude of 2845 m above sea level.The identification of the samples was conducted based on the morphological criteria outlined in "Flora of China" by Jinyuan Chen [10], ensuring accurate classification.To corroborate the authenticity of our specimens, voucher samples were diligently preserved and cataloged under the collection number QTP-LJQ-CHNR-020-1004.These voucher specimens are currently housed at the Herbarium of the Northwest Plateau Institute of Biology (HNWP), affiliated with the Chinese Academy of Sciences, situated in Xining, Qinghai Province, People's Republic of China.
Total genomic DNA was extracted from the dried leaf tissue using a modified Cetyl Trimethyl Ammonium Bromide method [42] after drying the fresh leaves with silica gel.The quality of DNA samples was assessed through agarose gel electrophoresis, and qualified samples were fragmented using a Covaris Ultrasonic Crusher.The fragments were purified, repaired, and had poly-A tails and sequencing adaptors added before amplification using the NEB Next ® Ultra ™ DNA Library Prep Kit for Illumina (USA).The quality of DNA was assessed, and the samples were sequenced using the Illumina NovaSeq 6000 platform (Novegene Biological Information Technology Co., Ltd., Beijing, China).

Genome assembly and annotation
The raw data was processed using FastQC v0.11.7 [43] and Fastp [44] to acquire high-quality clean reads by eliminating low-quality reads, shorter reads, and adapters.The high-quality reads were subsequently assembled into the complete cp genome of K. ceratoides using NOVOPlasty (version v4.2.1) [45] with the Dysphania botrys (NC_042166) cp genome as a reference.The complete cp genome was annotated using GeSeq [46] and PGA [47] by referencing the cp genomes of Dysphania botrys (NC_042166) and Dysphania pumilio (MK541016).The outcomes were manually examined and adjusted to minimize annotation errors, and the cp genome was submitted to GenBank with accession number OR635666.The cp genome's gene map was visualized using the online tool Chloroplot (available at https:// irsco pe.shiny apps.io/ Chlor oplot/) [48].

Fig. 1
Fig.1Circular map of Krascheninnikovia ceratoides (Amarathaceae) cp genome displays genes on the outer and inner parts of the circle, which are transcribed in counterclockwise and clockwise directions.The gray bars in the inner circle correspond to GC content.The gene names and their codon usage bias are labeled on the outermost layer

Fig. 2
Fig. 2 The variety and length of repeats in the Krascheninnikovia ceratoides cp genome are as follows: A The count of different types of repeat sequences; B The total number of repeat sequences according to their length

Fig. 3
Fig. 3 Examination of SSRs in the Krascheninnikovia ceratoides cp genome: A Categories and quantities of SSRs; B Placement of SSRs within the LSC, SSC, IRA, and IRB regions; C Localization of SSRs in coding sequence, IGS, and intron areas

Table 1
Characteristics of the chloroplast genome of Krascheninnikovia ceratoidesLSC Large single copy region, SSC Small single copy region, IR Inverted repeat

Table 2
Functional categories for the genes present in the chloroplast genome of Krascheninnikovia ceratoides a and b represent an intron and two introns in protein-coding genes, respectively.c represents duplicated genes